WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6:
H04L 12/56 



A1



(11) International Publication Number: WO 99/57858 

(43) International Publication Date: 11 November 1999 (11.11.99)



(21) International Application Number: PCT/US99/09853 

(22) International Filing Date: 5 May 1999 (05.05.99) 



(30) Priority Data: 
09/074,059 



7 May 1998 (07.05.98) 



US 



(71) Applicant: CABLETRON SYSTEMS, INC. [US/US]; 35 

Industrial Way, Rochester, NH 03867 (US). 

(72) Inventors: DONIS, Marc; 1225 S.W. First Avenue #423, 

Gainesville, FL 32601 (US). LEWIS, Lundy; 480 
Greenville Road, Mason, NH 03048 (US). DATTA, Utpal; 
52 Pinecrest Drive, Bedford, NH 03110 (US).

(74) Agent: HENDRICKS, Therese, A.; Wolf, Greenfield & Sacks, 
P.C., 600 Atlantic Avenue, Boston, MA 02210 (US). 



(81) Designated States: AU, CA, European patent (AT, BE, CH,
CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL,
PT, SE).



Published 

With international search report. 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 



(54) Title: MULTIPLE PRIORITY BUFFERING IN A COMPUTER NETWORK 



[Front-page drawing: buffering element 40 with queues 43a-43d: Queue 1, Depth D1; Queue 2, Depth D2; Queue 3, Depth D3; Queue 4, Depth D4]



(57) Abstract 



A buffer element for a communication network includes a first buffer memory to store communication units corresponding to a first
quality of service (QoS) level, and a second buffer memory to store communication units corresponding to a second quality of service
level. A buffer manager selectively stores communication units in the first and second buffer memories based on the corresponding quality of
service level, and retrieves communication units from the first and second buffer memories. The buffer manager includes a sorter unit for
selectively storing based on the quality of service level. The buffer element may further include a depth adjuster to adjust the depths of the
first and second buffer memories.

MULTIPLE PRIORITY BUFFERING IN A COMPUTER NETWORK

Field of the Invention 

The invention relates to communication networks and, more particularly, to buffering
received and/or transmitted communication units in a communications network.

Discussion of the Related Art 

Communication networks have proliferated to enable sharing of resources over a
computer network and to enable communications between facilities. A tremendous variety of
networks have developed. They may be formed using a variety of different inter-connection
elements, such as unshielded twisted pair cables, shielded twisted pair cables, shielded cable,
fiber optic cable, even wireless inter-connect elements and others. The configuration of these
inter-connection elements, and the interfaces for accessing the communication medium, may
follow one or more of many topologies (such as star, ring or bus). A variety of different
protocols for accessing the networking medium have also evolved.

A communication network may include a variety of devices (or "switches") for
directing traffic across the network. One form of communication network using switches is
an Asynchronous Transfer Mode (ATM) network. These networks route "cells" of
communication information across the network. (While the invention may be discussed in
the context of ATM networks and cells, this is not intended as limiting.)

FIG. 1 is a block diagram of one embodiment of a network switch 10. In this
particular example, the network switch has three input ports 14a-14c and three output ports
14d-14f. The switch is a unidirectional switch, i.e., data flows in only one direction, from
ports 14a-14c to ports 14d-14f. A communication unit (such as an ATM cell, data packet or
the like) may be received on one of the input ports (e.g., port 14a) and transmitted to any of the
output ports (e.g., port 14e). The selection of which output port should receive the
communication unit may depend on the ultimate destination of the communication unit
(and, in some networks, may also depend on the source of the communication unit).

Control units 16a-16c route communication units received on the input ports 14a-14c
through a switch fabric 12 to the applicable output ports 14d-14f. For example, a
communication unit may be received on port 14a. The control unit 16a may route the
communication unit (based, for example, on a destination address contained in the
communication unit) through the switch fabric 12 to the buffer 16e. From there, the
communication unit is output on port 14e.

The buffers 16d-16f permit the network switch 10 to reconcile differing rates at which
cells are received and transmitted. For example, if a number of cells are received on the various ports 14a-14c,
all for the same output port 14d, the output port 14d may not be able to transmit the
communication units as quickly as they are received. Accordingly, these units may be
buffered.

A great number of variations on the network switch 10 illustrated in FIG. 1 are
possible. For example, the functions of control units 16a-16c may be performed in a centralized manner. As another
example, the buffers 16d-16f may instead be placed at the input ports (e.g., as part of control units
16a-16c), rather than at the output ports. Another possibility is to use a combined buffer
for input and output. This may correspond to pairing an input port with an output port. For
example, input port 14a could be paired with output port 14d, producing the effect of a bidirectional port.

FIG. 2 illustrates buffering using separate receive and transmit buffers at the same
time. In this example, network port 24 includes both an input port (e.g., port 25a) and an
output port (e.g., port 25d). A buffer 26 is provided for the input port. A separate buffer 28 is
provided for the output port. Information may be routed through the network switch fabric 22
between ports, as generally described above.

FIG. 3 illustrates an alternative embodiment. In this embodiment, combined receive
and transmit buffers are shown. In this embodiment, the receive buffer 36 and transmit buffer
are stored in a common memory 35.

Another alternative would be to provide a receive buffer and a transmit buffer that
include a shared memory area. Such a system is described in copending and commonly
owned United States Patent Application Serial No. 08/847,344, entitled Method And
Apparatus For Adaptive Port Buffering, filed April 24, 1997, by Steve Augusta et al., which is
hereby incorporated by reference in its entirety.

In many networks, all communication units are treated equally, i.e., all
communication units are assumed to have the same priority in traveling across a network.
Alternatively, various levels of quality of service ("QoS") may be provided. This has been
applied in ATM networks, although the concept may be applied in other contexts.

In one example, different services offered over the network may have different
transmission requirements. For example, video on demand may require a high quality of service
(to avoid jerky motion in the video), while e-mail tolerates a lower quality of service.
Subscribers may be offered the option to pay higher prices for higher levels of quality of
service.

Summary of the Invention

According to one embodiment of the present invention, a buffer element for a
communication network is disclosed. A first buffer memory is provided to store
communication units corresponding to a first quality of service (QoS) level. A second buffer
memory stores communication units corresponding to a second quality of service level. A
buffer manager is coupled to the first buffer memory and the second buffer memory. A depth
adjuster may be provided to adjust corresponding depths of the first buffer memory and the
second buffer memory.

According to another embodiment of the present invention, a switch for a
communication network is disclosed. The switch includes a plurality of ports, a first buffer
memory coupled to one of the ports to store communication units corresponding to a first
quality of service level and a second buffer memory coupled to the one of the ports to store
communication units corresponding to a second quality of service level.

According to another embodiment of the present invention, a method of buffering
communication units in a communication network is disclosed. According to this
embodiment, a queue depth is assigned for each of a plurality of queues, each queue being
designated to store communication units of a predetermined quality of service level. The
plurality of queues is provided, each having the corresponding assigned depth. One of the
queues is selected to receive a communication unit, based on a quality of service level
associated with the communication unit. The communication unit may then be stored in the
selected queue. This embodiment may further comprise a step of adjusting queue depths.

According to another embodiment of the present invention, a method of selecting a
communication unit for transmission in a communication network that provides a plurality of
quality of service levels is disclosed. In this embodiment, the communication unit is selected
from a plurality of communication units stored in a buffer, the buffer including a plurality of
queues, each queue corresponding to one of the quality of service levels. The method of this
embodiment includes the steps of identifying the non-empty queue with the highest corresponding
quality of service level, and then selecting the communication unit from the
identified queue.

According to another embodiment of the present invention, a method of storing a
communication unit in a buffer is disclosed. According to this embodiment, the
communication unit has one of a plurality of quality of service levels and the buffer includes a
plurality of queues, each queue corresponding to one of the quality of service levels.
According to this embodiment, the method comprises steps of determining the quality of
service level of the communication unit and storing the communication unit in the queue
having the corresponding quality of service level of the communication unit. According to
this embodiment, the communication unit may be dropped when the queue having the
corresponding quality of service level of the communication unit is full (or, alternatively,
placed in a queue for a lower quality of service level).

Brief Description of the Drawings

FIG. 1 illustrates one embodiment of a network switch in a communication network. 

FIG. 2 illustrates one embodiment of buffering for a switch. 

FIG. 3 illustrates another embodiment of buffering for a switch. 

FIG. 4 illustrates one embodiment of a buffer element according to the present
invention.

FIG. 5 illustrates one embodiment of a network switch according to the present 
invention. 

FIG. 6 illustrates one embodiment of a method for receiving cells using the buffering
element illustrated in FIG. 4.

FIG. 7 illustrates one embodiment of retrieving cells from a buffer element such as
that shown in FIG. 4.

FIG. 8 illustrates one embodiment of a method for determining depth assignments for 
a buffering element. 

FIG. 9 illustrates one embodiment of a graphical user interface for inputting queue
depth assignment problems.

FIG. 10 illustrates one embodiment of a buffer element and associated controllers for 
use in a communication network. 

FIG. 11 illustrates one embodiment of a method for adjusting queue depths during use
of the communication network. 

Detailed Description 

Design of a communication network (or a switch for use in a communication network)
that supports various levels of QoS can be a difficult task. One difficulty is determining the
quality of a particular implementation. Generally, the design of a communication network
may pursue the following (sometimes conflicting) goals: 1) accommodating traffic through
the network; 2) making efficient use of the network facilities; and 3) ensuring that network
performance reflects the appropriate QoS levels.

Two potential measures of the quality of service offered include cell loss rate (CLR) 
and cell transfer delay (CTD). CLR reflects the number of cells that are lost. For example, if 
more cells arrive at a switch than can be accommodated in the switch's buffer, some cells may 
be lost. 

CTD corresponds to the amount of time a cell spends at a switch (or other storage
and/or transfer device) before being transmitted. For example, if a cell sits in a buffer for a
long period of time while other (e.g., higher QoS level) cells are transmitted, the CTD of the
delayed cell is the amount of time it spends in the buffer.

In the embodiment described below, mean cell loss rate (CLR) and mean cell transfer
delay (CTD) are used to measure the quality of service. Of course, a number of variations on
these measures, as well as other measures, could be used. For example, cell delay variation
(the amount of variation in cell delay) or maximum CTD (rather than average CTD) could be
used as alternative or additional measures.
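As a rough illustration of the two measures, the following Python sketch keeps per-QoS-level counters and derives a mean CLR and a mean CTD from them. It assumes CLR is expressed as the fraction of offered cells that are lost and that CTD runs from a cell's arrival at the buffer to its transmission; the `QosStats` class and its methods are hypothetical names, not part of the described switch.

```python
from dataclasses import dataclass


@dataclass
class QosStats:
    """Running counters for one QoS level (hypothetical helper)."""
    offered: int = 0          # cells transmitted or dropped; cells still queued are not counted
    lost: int = 0             # cells dropped because the queue was full
    transmitted: int = 0      # cells that left the buffer
    total_delay: float = 0.0  # summed buffer residence time of transmitted cells

    def record_loss(self) -> None:
        self.offered += 1
        self.lost += 1

    def record_transmit(self, arrival_time: float, departure_time: float) -> None:
        self.offered += 1
        self.transmitted += 1
        self.total_delay += departure_time - arrival_time

    @property
    def clr(self) -> float:
        """Cell loss ratio: lost cells as a fraction of offered cells."""
        return self.lost / self.offered if self.offered else 0.0

    @property
    def mean_ctd(self) -> float:
        """Mean cell transfer delay over transmitted cells."""
        return self.total_delay / self.transmitted if self.transmitted else 0.0
```

One `QosStats` object per queue is enough to report the mean CLR and mean CTD for each QoS level; a maximum-CTD or delay-variation measure would need extra fields.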

FIG. 4 illustrates one embodiment of a buffer element for use in a network
accommodating multiple QoS levels. A buffering mechanism 40 is provided at a switch port,
such as the buffering element 16d at port 14d of FIG. 1. In that particular example, the
buffering occurs at an output port 14d. In alternative embodiments, buffering may be
associated with an input port (e.g., 14a-14c of FIG. 1) or both input and output ports.

In the example of FIG. 4, the buffering element 40 includes four queues (also referred
to as buffers) 43a-43d. Each queue is composed of a storage component, such as a random
access memory (or any other storage device). Each queue 43a-43d is associated with a
particular QoS level for the network. Thus, in the example of FIG. 4, there are four QoS
levels. Queue 1 (43a) corresponds to the highest QoS level. Queue 2 (43b) corresponds to the
second highest QoS level. Queue 3 (43c) corresponds to the third highest QoS level. Queue 4
(43d) corresponds to the lowest QoS level.

Each of the queues 43a-43d also has an associated depth. The depth corresponds to
the amount of information that can be stored in the particular queue. Where incoming cells 41
have a fixed length, the depth of the queue may be measured by the number of cells that can
be stored in that queue.

In FIG. 4, queue 1 (43a) has a depth D1. Queue 2 (43b) has a depth D2. Queue 3 (43c)
has a depth D3. Queue 4 (43d) has a depth D4. Each of the depths D1-D4 may be of a
different size. When incoming cells 41 are directed to the port, a sorter 44 assigns each cell to
the appropriate queue 43a-43d based on the QoS of the cell. In most cases, the QoS of the cell
will be indicated in an information field within the cell itself.

When a cell can be transmitted from the port, a merge unit 45 selects the appropriate
cell for transmission. While the sorter 44 and merge unit 45 are shown as separate
components, these may be implemented in a number of ways. For example, the sorter and
merge unit may be separate hardware components. In another embodiment, the sorter 44 and
merge unit 45 may be implemented as programs on a general purpose computer coupled to the memory or
memories storing queues 43a-43d. In another embodiment, a common merge unit is used for
all of the ports (particularly where buffering is done on an input port).

The queues 43a-43d may be implemented using separate memories. In the alternative,
the queues may be implemented in a single memory unit, or distributed across multiple shared
memory units. The memory units may be conventional random access memory devices or any
other storage element, such as shift registers or other devices.

FIG. 5 illustrates one embodiment of a switch 50 that includes buffering elements
53a, 53b, 54a, 54b, 55a, 55b, 56a and 56b, similar to those illustrated in FIG. 4. The
embodiment of FIG. 5 has four input ports 51a-51d and four output ports 52a-52d (and hence
is a 4x4 switch).

In the example of FIG. 5, there are only two QoS levels. In this example, each output
port 52a-52d has two associated queues (one for each QoS level). For example, output port
52a has two associated queues 53a and 53b. Again, while this embodiment illustrates
buffering on the output ports, buffering could instead be done on the input ports or on both
input and output ports. In addition, while FIG. 5 illustrates queues 53a-56b as separate
devices, they may be stored in one, or across several, memory chips or other devices.

FIG. 6 illustrates one embodiment of a process for receiving cells at a buffering
element, such as receiving incoming cells 41 at buffering element 40 of FIG. 4. The process
begins at a step 60 when a cell is received. At a step 61, the appropriate QoS level for the cell
is determined. This may be done, for example, by examining a field in the cell that specifies
or otherwise indicates the QoS level.

At a step 62, it is determined whether there is room in the appropriate QoS buffer to
receive the cell. If so, the cell is stored in the buffer, at a step 63. If there is no room in the
appropriate QoS buffer, the cell is dropped at a step 64.

Of course, a number of variations on this process may be developed. As just one
example, if there is no room in the appropriate QoS buffer (step 62), buffers of a lower
priority could be examined. If there is room in a lower priority buffer, the cell could be stored
in that buffer (additional steps, such as taking cells from a queue out of FIFO order, may be
needed when the order of cell transmission is important). In any event, a number of variations
and optimizations may be made to the embodiment of FIG. 6.
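The receive process of FIG. 6, including the optional fall-back to lower-priority queues, might be sketched as follows. This is a minimal Python model under stated assumptions: queue index 0 is the highest QoS level, depths are counted in cells, and the QoS level has already been read from the cell (step 61). `BufferElement` and `receive` are illustrative names only.

```python
from collections import deque
from typing import Any, List, Optional


class BufferElement:
    """Sketch of buffering element 40: one FIFO queue per QoS level.

    Index 0 is the highest QoS level (queue 43a). Depths are counted in cells,
    which assumes fixed-length cells such as ATM cells.
    """

    def __init__(self, depths: List[int], fallback_to_lower: bool = False) -> None:
        self.depths = list(depths)
        self.queues = [deque() for _ in self.depths]
        # Optional variation: try lower-priority queues before dropping.
        self.fallback_to_lower = fallback_to_lower

    def receive(self, cell: Any, qos: int) -> Optional[int]:
        """Steps 62-64 for a cell whose QoS level (found in step 61) is `qos`.

        Returns the index of the queue that stored the cell, or None if dropped.
        """
        candidates = [qos]
        if self.fallback_to_lower:
            candidates.extend(range(qos + 1, len(self.queues)))
        for level in candidates:
            # Step 62: is there room in this queue?
            if len(self.queues[level]) < self.depths[level]:
                self.queues[level].append(cell)  # step 63: store the cell
                return level
        return None  # step 64: drop the cell
```

With `fallback_to_lower=False` this matches the basic flow of FIG. 6; enabling it gives the lower-priority variation, at the cost of cells from one QoS level possibly being spread over two queues.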

FIG. 7 illustrates one embodiment of a method for retrieving cells stored in a buffering
element, such as selecting the outgoing cells 42 of FIG. 4.

In this particular embodiment, the top level queue is selected first (e.g., queue 43a of
FIG. 4), at a step 70.

At a step 71, it is determined whether the selected queue is empty. If so, the next 
queue is selected (at a step 73), and examined to determine if it is empty (step 71). 

Once a queue that is not empty has been found, one (or more) cells from that queue are
transmitted at a step 72. In this particular embodiment, after a cell has been transmitted, the
top level queue is again examined. Accordingly, the effect of the embodiment in FIG. 7 is to
transmit cells from the highest level queue that is holding cells, until there are none left.
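A minimal sketch of the FIG. 7 selection loop, assuming the queues are Python `deque` objects ordered from highest to lowest QoS level and that one cell is sent per transmission opportunity; `select_cell` is a hypothetical helper name.

```python
from collections import deque
from typing import Any, Sequence


def select_cell(queues: Sequence[deque]) -> Any:
    """FIG. 7 sketch: remove the head cell of the highest-priority non-empty queue.

    queues[0] is the highest QoS level. Every call starts again at the top
    queue, mirroring the return to step 70 after each transmission.
    """
    for queue in queues:              # steps 70 and 73: walk down the levels
        if queue:                     # step 71: skip empty queues
            return queue.popleft()    # step 72: transmit in FIFO order
    return None                       # all queues are empty
```

Calling `select_cell` once per transmission slot drains the highest non-empty queue first, which is exactly what exposes the lowest levels to starvation under sustained high-priority traffic.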

A number of variations or alternatives are possible. For example, in the embodiment
of FIG. 7, a cell in the lowest QoS level queue could be indefinitely prevented from being transmitted
by a long stream of cells arriving for higher level QoS queues. An alternative, therefore,
would be to rotate priority among the QoS levels (e.g., give the highest level QoS queue first
priority sixty percent of the time, the second highest level queue thirty percent of the time,
the third highest level queue ten percent of the time and the lowest QoS level queue none
of the time). Another alternative would be to monitor cell delay and require transmission of
cells after a certain delay (the delay potentially depending on the QoS level). For example,
queue 3 could be given highest priority when cells have been sitting in that queue for longer
than a first period of time, and queue 4 given highest priority when cells have been sitting in
that queue for longer than a second period of time (in most cases, the period of time for the lower QoS
levels will be greater than the period of time for the higher QoS levels). Again, a number of
variations and optimizations are possible.
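The priority-rotation alternative can be sketched by drawing, on each transmission opportunity, which queue gets first priority. The percentages come from the example above; what happens when the favoured queue is empty is not specified, so this sketch assumes the remaining queues are then tried in strict priority order. `select_cell_weighted` is a hypothetical name.

```python
import random
from collections import deque
from typing import Any, Sequence


def select_cell_weighted(queues: Sequence[deque], weights: Sequence[float],
                         rng: random.Random) -> Any:
    """Give first priority to a QoS level drawn according to `weights`.

    Assumption: if the favoured queue is empty, the other queues are tried in
    strict priority order (index 0 is the highest QoS level).
    """
    levels = range(len(queues))
    favoured = rng.choices(levels, weights=weights, k=1)[0]
    order = [favoured] + [level for level in levels if level != favoured]
    for level in order:
        if queues[level]:
            return queues[level].popleft()
    return None
```

For example, `select_cell_weighted(queues, (0.6, 0.3, 0.1, 0.0), random.Random(42))` follows the percentages in the text. With those weights the fourth queue is served only when the other three are empty, so the delay-threshold alternative would still be needed to bound its waiting time.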

In the embodiment of FIG. 7, cells are removed from the queue on a first-in, first-out
("FIFO") basis. Again, a number of alternatives are possible. For example, if a cell is in
the highest QoS level queue but cannot be transmitted, another cell may be selected from the
highest QoS level queue (or, in the alternative, a cell may be selected from the next QoS level queue).
A cell may not be capable of transmission when, for example, the place to which it is being
transmitted is blocked. One example of this situation occurs when the buffers appear at the
input ports (e.g., port 14a of FIG. 1). If another port is transmitting a cell to a particular
output port (e.g., port 14d), no other cell stored at any other input port can be transmitted to
that same port at the same time. Thus, a cell in the highest QoS level queue associated with port 14a
might be blocked from transmission to port 14d by another cell being transmitted to that port.

Referring again to FIG. 4, the buffering element has M queues, where M is the
number of QoS levels accommodated by the switch. In the example of FIG. 4, M equals 4.
Referring again to FIG. 5, an N by N switch is disclosed (in FIG. 5, N=4). Where
buffers appear only on the outputs (or only on the inputs), there may be a total of M x N queues in the
switch.

In one embodiment of the present invention, each of the queues may have a different
depth. That is, the queues need not all be the same size. In these embodiments, therefore,
a problem may be posed of how much memory to provide for each queue, to meet system (and
QoS) requirements. This may be referred to as a queue depth assignment problem.

In one embodiment, the assignment of depths to each of the queues is based on 
performance and characteristics of the network and switch. The depth assignments should
satisfy the following equation: 



$$\sum_{i=1}^{N} \sum_{j=1}^{M} D_{ij} \le m$$



Where m is the total memory available in the switch and $D_{ij}$ is the depth of the queue at port i and
QoS level j. Thus, the sum of the depths of all of the queues has to be less than or equal to
the total memory (m) available in the switch. As can be seen from this model, the depths of all
of the highest quality level queues within the switch may, but need not, be the same. For
example, referring again to FIG. 1, more memory could be provided for the highest level
queuing associated with port 14d than with port 14e.
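The memory constraint and the queue count are simple to check in code. In this Python sketch, which is only an illustration, `depths[i][j]` holds $D_{ij}$ for port i and QoS level j (zero-based), and both helper names are hypothetical.

```python
from typing import Sequence


def queue_count(n_ports: int, m_levels: int) -> int:
    """Total queues when buffering is on the outputs (or inputs) only: M x N."""
    return n_ports * m_levels


def fits_in_memory(depths: Sequence[Sequence[int]], m: int) -> bool:
    """True when the sum over ports i and QoS levels j of D_ij does not exceed m."""
    return sum(sum(port) for port in depths) <= m
```

For the two-level, four-port switch of FIG. 5, `queue_count(4, 2)` gives the eight queues 53a-56b.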

One way to determine queue depth is to ascertain a mathematical model for the quality 
of the queue depth assignments. The mathematical model can then be solved or used to 
evaluate possible solutions of the depth assignment problem. 

In the following example, an energy function is defined to reflect the measure of the 
quality of the potential solution of the depth assignment problem. In this example, the lower 
the energy function, the better the solution. The energy function is: 



$P_{1j}$ is the constant penalty imposed for a dropped cell on QoS j. (For example, with three QoS
levels, weights 10, 5 and 1 could be respectively assigned as the penalty for dropping a cell of
the corresponding QoS level.)

$P_{2j}$ is the penalty imposed for a cell waiting on QoS j. (For example, with three QoS
levels, penalties of 8, 4 and 0 could be assigned for each unit time delay of a cell having the
corresponding QoS level.)

$\rho_{ij}$ is the load on port i, QoS j, which is given by $\rho_{ij} = \lambda_{ij}/\mu_j$. Here, $\lambda_{ij}$ is the arrival rate,
in packets/sec, on port i, QoS j, and $\mu_j$ is the processing rate of QoS j, also in packets/sec.

The function $f_1(D, \rho)$ is the cell loss probability. Therefore, $f_1(D, \rho)\,\lambda_{ij}$ corresponds to
the CLR. The function $f_2(D, \rho, \lambda)$ corresponds to the CTD.
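The energy equation itself is not legible in this copy. The definitions above suggest a penalty-weighted sum of CLR and CTD over every port and QoS level, and the sketch below assumes exactly that form, $E = \sum_i \sum_j [P_{1j} f_1(D_{ij},\rho_{ij})\lambda_{ij} + P_{2j} f_2(D_{ij},\rho_{ij},\lambda_{ij})]$; the original may differ. $f_1$ and $f_2$ are passed in as functions so that any queuing model can be plugged in, and the function and type names are hypothetical.

```python
from typing import Callable, Sequence

LossFn = Callable[[int, float], float]          # f1(D, rho): cell loss probability
DelayFn = Callable[[int, float, float], float]  # f2(D, rho, lam): mean cell delay


def energy(depths: Sequence[Sequence[int]],
           lam: Sequence[Sequence[float]],
           mu: Sequence[float],
           p1: Sequence[float],
           p2: Sequence[float],
           f1: LossFn,
           f2: DelayFn) -> float:
    """Assumed energy: penalty-weighted CLR plus CTD over every port i and QoS j.

    depths[i][j] = D_ij, lam[i][j] = lambda_ij, mu[j] = mu_j; lower is better.
    """
    total = 0.0
    for i, port in enumerate(depths):
        for j, d in enumerate(port):
            rho = lam[i][j] / mu[j]              # load rho_ij = lambda_ij / mu_j
            clr = f1(d, rho) * lam[i][j]         # lost cells per second
            ctd = f2(d, rho, lam[i][j])          # mean delay for this queue
            total += p1[j] * clr + p2[j] * ctd
    return total
```

Any model for $f_1$ and $f_2$, such as the M/M/1/K expressions discussed below, can be supplied as the last two arguments.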

To use the above energy function, the particular variables of the equation have to be
filled in. Values of $\lambda_{ij}$ may be determined by observing the traffic over the switch for some
length of time and averaging arrival rates on each queue. Of course, other methods are
possible.

The processing rates $\mu$ of each queue may be determined from the switch's performance
characteristics (or observed).
The penalty parameter arrays $P_1$ and $P_2$ may be determined subjectively by the user.
These values represent the relative importance of minimizing each of the objective measures
$f_1$ and $f_2$ (e.g., CLR and CTD) for each queue. For example, if $P_1$ = (10, 5, 2, 0), then a
penalty of ten is imposed for a lost cell on the first QoS level, a penalty of five on the second
QoS level, a penalty of two on the third QoS level, and no penalty on the fourth QoS level. In
this example, performance on the fourth QoS level will be sacrificed to improve CLRs of the
other QoS levels. Similarly, the penalty associated with cell delay, $P_2$, needs to be specified for
each of the QoS levels.

The M/M/1/K queuing model may be used to predict CLR and CTD. This model is
discussed, for example, in Kleinrock, L., Queueing Systems, Vol. 1: Theory, New York, NY:
John Wiley & Sons, Inc., 1975, pp. 103-5; and Fu, L., Neural Networks in Computer
Intelligence, New York, NY: McGraw-Hill, Inc., 1994, pp. 41-5. This model assumes that
$\rho < 1$, where $\rho$ is the load. The cell loss probability, $f_1$, is given by

$$f_1(D_{ij}, \rho_{ij}) = \frac{(1 - \rho_{ij})\,\rho_{ij}^{D_{ij}}}{1 - \rho_{ij}^{D_{ij}+1}}$$

and the CTD is given by 

(A variety of other models may also be used to predict CLR and CTD. CLR and CTD may
also be estimated by taking actual measurements on a system while it is performing.)
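For reference, the standard M/M/1/K results can be coded directly, taking the capacity K to be the queue depth $D_{ij}$. This sketch uses the textbook loss probability and, as an assumption, takes CTD to be the mean time in the system (waiting plus service), obtained from Little's law with the accepted arrival rate $\lambda(1-f_1)$. The exact CTD expression used in the original is not legible here, and the function names are hypothetical.

```python
def mm1k_loss_probability(depth: int, rho: float) -> float:
    """f1: probability that an arriving cell finds the queue full (M/M/1/K, K = depth)."""
    if rho == 1.0:
        return 1.0 / (depth + 1)
    return (1 - rho) * rho ** depth / (1 - rho ** (depth + 1))


def mm1k_mean_delay(depth: int, rho: float, lam: float) -> float:
    """f2: mean time in the system for accepted cells (assumes lam > 0).

    Mean number in an M/M/1/K system, divided by the accepted arrival rate
    (Little's law).
    """
    if rho == 1.0:
        mean_in_system = depth / 2
    else:
        mean_in_system = (rho / (1 - rho)
                          - (depth + 1) * rho ** (depth + 1) / (1 - rho ** (depth + 1)))
    accepted_rate = lam * (1 - mm1k_loss_probability(depth, rho))
    return mean_in_system / accepted_rate
```

As a sanity check, a very deep queue behaves like an M/M/1 queue: with $\lambda = 0.5$ and $\mu = 1$ the mean delay approaches $1/(\mu - \lambda) = 2$, while a depth of one cell leaves only the service time of $1/\mu = 1$.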

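One possible approach to finding the minimum of E is to examine all possible depth
assignments. As is typical of combinatorial problems of this nature, however, the cost of
exhaustive search grows factorially. When every queue is assigned a depth of at least one cell
and the depths sum to exactly m, the number of feasible solutions is equal to

$$\binom{m-1}{NM-1} = \frac{(m-1)!}{(NM-1)!\,(m-NM)!}$$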

Table 1 below illustrates a few examples to show the growth of this function.

Table 1.

m      NM     Number of possible solutions
30     10     1.00 x 10^7
30     15     7.76 x 10^7
40     10     2.12 x 10^8
40     20     6.89 x 10^10
100    10     1.73 x 10^12
100    25     6.06 x 10^22
100    50     5.04 x 10^28
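The counts in Table 1 equal the number of ways to split m cells of memory among NM queues when every queue is at least one cell deep, i.e., the binomial coefficient C(m-1, NM-1). A one-line Python sketch reproduces the table; `feasible_assignments` is a hypothetical name.

```python
from math import comb


def feasible_assignments(m: int, num_queues: int) -> int:
    """Ways to split m cells among num_queues queues, each at least one cell deep."""
    return comb(m - 1, num_queues - 1)


for m, nm in [(30, 10), (30, 15), (40, 10), (40, 20), (100, 10), (100, 25), (100, 50)]:
    print(f"{m:>4} {nm:>4} {feasible_assignments(m, nm):.2e}")
```

Even a small switch with 100 cells of memory and 50 queues has about 5 x 10^28 candidate assignments, which is why the heuristic searches below are used instead.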



Under certain embodiments of the present invention, alternative methods may be used
to find optimal (or, hopefully, close to optimal) solutions. For example, neural networks, genetic
algorithms and other approaches may be used.

In one embodiment of the present invention, a straightforward genetic algorithm is
used to minimize the above energy function. According to this method, the search starts from an
initial solution. This initial solution can be any random solution, or may be selected intelligently
as discussed below.

The genetic algorithm then uses a mutation operator that may consist of picking a
random port, subtracting a random number from the depth of a randomly selected queue on that port and
adding that same number to the depth of another randomly selected queue on the same port. Simple
single-point crossover may be used to combine solutions. In each generation of the genetic
algorithm, an elite percentage of the population is preserved and used to reproduce the
remainder of the population using crossover. Half of the offspring may further be mutated a
number of times.
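The mutation operator just described can be sketched in Python as follows. The extra rule that every queue keeps at least one cell is an assumption added here, as are the function and parameter names; `depths[i][j]` is the depth of queue j on port i.

```python
import random
from typing import List


def mutate(depths: List[List[int]], rng: random.Random) -> List[List[int]]:
    """Move a random amount of memory between two queues on one random port.

    Port totals, and hence the switch total m, are unchanged.
    Assumption: every queue keeps at least one cell.
    """
    child = [row[:] for row in depths]         # leave the parent untouched
    row = child[rng.randrange(len(child))]     # pick a random port
    if len(row) < 2:
        return child
    src, dst = rng.sample(range(len(row)), 2)  # two distinct queues on that port
    if row[src] <= 1:
        return child                           # nothing can be taken from src
    amount = rng.randint(1, row[src] - 1)
    row[src] -= amount
    row[dst] += amount
    return child
```

Because memory only moves within a port, mutation never violates the total-memory constraint, so no repair step is needed after it.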

In an alternative embodiment, steepest-ascent (or, equivalently, steepest-descent)
hill-climbing (SAHC) may be used. In certain applications, this algorithm may produce
results similar to those of the genetic algorithm in considerably shorter time.

Using steepest-descent hill-climbing, a local minimum solution can be found by
following the steepest path down the energy surface, i.e., following search paths that provide the
greatest decreases in the energy function.

The steepest-descent hill-climbing approach may be modified to include random
jumps. This would permit the algorithm to jump over small "hills" on the energy function
surface. This process employs the technique called simulated annealing, known in the art.
The hill-climbing may be achieved by systematically (rather than randomly)
incrementing each $D_{ij}$ by one and at the same time reducing the depth of a randomly selected
queue by one (thus keeping the total memory usage constant and equal to m). The energy
function of each potential solution may be evaluated and the best set of queue depths saved.

For each of the above, an intelligent initial solution can improve the results and/or
reduce the amount of time required to achieve a good solution. In one embodiment, the
solution is initialized to have queue depths $D_{ij}$ proportional to $\rho_{ij}(P_{1j} + P_{2j})$ and summing to
exactly m.

Thus, FIG. 8 illustrates one embodiment of a method for finding a solution to the
queue depth assignment problem. This embodiment begins at a step 80, where an initial
solution is formed. This solution may be formed as described above, with depths $D_{ij}$
proportional to $\rho_{ij}(P_{1j} + P_{2j})$ and summing to exactly m.
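Step 80's initial solution needs integer depths that are proportional to $\rho_{ij}(P_{1j}+P_{2j})$ yet sum to exactly m. The text does not say how the rounding is done; this Python sketch assumes the largest-remainder method, and `initial_depths` is a hypothetical name.

```python
from math import floor
from typing import List, Sequence


def initial_depths(lam: Sequence[Sequence[float]], mu: Sequence[float],
                   p1: Sequence[float], p2: Sequence[float], m: int) -> List[List[int]]:
    """Depths D_ij proportional to rho_ij * (P_1j + P_2j), summing exactly to m.

    Rounding uses the largest-remainder method (an assumption; the text only
    requires proportionality and an exact total of m).
    """
    weights = [[lam[i][j] / mu[j] * (p1[j] + p2[j]) for j in range(len(mu))]
               for i in range(len(lam))]
    total = sum(sum(row) for row in weights)
    shares = [[m * w / total for w in row] for row in weights]
    depths = [[floor(s) for s in row] for row in shares]
    leftover = m - sum(sum(row) for row in depths)
    # Hand the remaining cells to the queues with the largest fractional parts.
    order = sorted(((shares[i][j] - depths[i][j], i, j)
                    for i in range(len(shares)) for j in range(len(shares[i]))),
                   reverse=True)
    for _, i, j in order[:leftover]:
        depths[i][j] += 1
    return depths
```

For example, two ports with $\lambda$ = ((50, 30), (25, 15)), $\mu$ = (100, 60), $P_1$ = (10, 5), $P_2$ = (8, 4) and m = 100 give depths ((45, 22), (22, 11)).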

At a step 88, the current best solution is mutated to determine whether a better potential
solution can be found; the possible solutions are generated at this step. For each of the
queues at the switch, the applicable depth $D_{ij}$ is decreased
by one. In addition, a randomly selected queue depth $D_{xy}$ is incremented by one. This forms a
new potential solution, moving one storage element from a current existing queue to a new
queue. Because one unit is subtracted and one is added, the total memory for the switch remains the
same. (Here, adding and subtracting one corresponds to adding and subtracting
sufficient storage to accommodate one cell.)

After the new possible solution is generated, its energy function may be evaluated. If
this is the best energy function value encountered so far, this solution is saved and used for the next
iteration (the next time step 88 is performed). Otherwise, processing simply continues and the
current solution remains the best one encountered so far. Optionally, in the event of a tie, the
newly generated solution is selected. After a variety of potential solutions have been examined at step
88, it is determined whether the algorithm has improved the best solution encountered so far
at any point in the last (for example) twenty iterations (twenty passes through step 88).
If not, the current best solution is taken as the solution to the queue depth problem. If so, the
solution has not yet been stable for twenty iterations, and processing continues by returning
to step 88 (using the current best solution).
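Putting the pieces of FIG. 8 together, a compact Python sketch of the search loop might look like this. The energy function is passed in as a callable, the patience of twenty passes follows the example in the text, and the minimum depth of one cell per queue is an added assumption; `hill_climb` is a hypothetical name.

```python
import random
from typing import Callable, List

Depths = List[List[int]]


def hill_climb(initial: Depths, energy: Callable[[Depths], float],
               rng: random.Random, patience: int = 20) -> Depths:
    """Sketch of the FIG. 8 search: step 80 supplies `initial`; each loop is step 88.

    For every queue (i, j), one cell of memory is moved from D_ij to a randomly
    chosen other queue D_xy, so the total stays equal to m. The search stops
    once `patience` passes in a row fail to lower the best energy.
    """
    best = [row[:] for row in initial]
    best_e = energy(best)
    cells = [(i, j) for i, row in enumerate(best) for j in range(len(row))]
    stale = 0
    while stale < patience:
        base = [row[:] for row in best]       # mutate the current best solution
        improved = False
        for i, j in cells:
            if base[i][j] <= 1:
                continue                      # assumption: keep every queue >= 1 cell
            x, y = rng.choice([c for c in cells if c != (i, j)])
            cand = [row[:] for row in base]
            cand[i][j] -= 1
            cand[x][y] += 1
            e = energy(cand)
            if e <= best_e:                   # optional rule: ties favour the new solution
                improved = improved or e < best_e
                best, best_e = cand, e
        stale = 0 if improved else stale + 1
    return best
```

In practice `energy` would close over the arrival rates, processing rates and penalty arrays, for example by wrapping an energy function built on the M/M/1/K model.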

FIG. 9 illustrates one embodiment of a graphical user interface that may be used for
solving a queue depth assignment problem. In this particular embodiment, the interface 90
includes an input area 91 and a help area 92. The help area 92 provides a scrollable help
document.

As illustrated at 91, the following fields may be input to frame the queue depth
assignment problem. The number of switches in the network may be input at 91a, since
more than one switch may be present in the switch fabric.

At 91b, a user may input the number of input and output ports on each switch (N). At
91c, the user may input the number of QoS levels supported by the switch. At 91d, the user
may input the total memory available on each switch. (In this embodiment, the input is in
terms of the number of cells that can be stored in all of the buffers on the switch.)

At 91e, the user may input the penalty for losing a cell on each QoS level. In the example illustrated in FIG. 9, there are two QoS levels (as shown at 91c). Accordingly, two different entries need to be made at 91e, one for each QoS level.

Similarly, at 91f, the user inputs the penalties for cell delay on each QoS level. As above, the number of entries may correspond to the number of QoS levels (again indicated at 91c).

At 91g, the processing rates for each quality of service level are input. Finally, at 91h, the arrival rates ($\lambda$) for each queue on every switch are input. Thus, in this example, eight entries need to be made: one for each of the two queues on each of the four output ports.
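The inputs gathered at 91a-91h can be held in a small record before being handed to the search. The sketch below is hypothetical: the class and field names are not from the source, and it checks only the entry counts stated above (one loss penalty, delay penalty and processing rate per QoS level, and one arrival rate per queue).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueDepthProblem:
    """Hypothetical container for the fields entered at 91a-91h."""
    num_switches: int             # 91a
    ports_per_switch: int         # 91b: N
    qos_levels: int               # 91c
    memory_cells: int             # 91d: cells storable in all buffers of a switch
    loss_penalty: List[float]     # 91e: one entry per QoS level
    delay_penalty: List[float]    # 91f: one entry per QoS level
    processing_rate: List[float]  # 91g: one entry per QoS level
    arrival_rate: List[float]     # 91h: lambda, one entry per queue on every switch

    def expected_arrival_entries(self) -> int:
        # One queue per QoS level on each output port of each switch.
        return self.num_switches * self.ports_per_switch * self.qos_levels

    def validate(self) -> None:
        for name in ("loss_penalty", "delay_penalty", "processing_rate"):
            if len(getattr(self, name)) != self.qos_levels:
                raise ValueError(f"{name} needs one entry per QoS level")
        if len(self.arrival_rate) != self.expected_arrival_entries():
            raise ValueError("arrival_rate needs one entry per queue")
```

For the FIG. 9 example (two QoS levels and four output ports, which implies a single switch), `expected_arrival_entries()` returns 8, matching the eight entries at 91h.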

Tables 2 and 3 below show examples of application of the algorithm of FIG. 8 to the following queue depth assignment problems. Values for $\lambda$ were determined by two different methods to simulate mean and maximum load measures. In Table 2, $\lambda$ values were determined by taking the mean of five random numbers. In Table 3, $\lambda$ values are the maximum of five random numbers. In both cases, the constraint $\lambda_{ij} < \mu_{ij}$ is enforced.

In all experiments, the number of QoS levels is $M = 4$, $P_1 = (10, 5, 2, 1)$, and $P_2 = (8, 4, 0, 0)$. Values of $\mu$ were 100, 60, 30, and 15. The Percent Improvement columns show the improvement over the initial solution (framed using the intelligent initial solution described above) in each QoS measure for each QoS level. CLRs and CTDs are averaged for each QoS level, and are listed in order of QoS level.
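The two ways of drawing arrival rates can be sketched as follows. The source does not say which distribution the five random numbers come from; this sketch assumes each is uniform on $[0, \mu_j)$, uses the four listed $\mu$ values as one processing rate per QoS level $j$, and takes $i$ to index the port. The rejection loop enforces $\lambda_{ij} < \mu_{ij}$ as stated; under the assumed distribution it almost never retries. The function name, its parameters and the port count in the example are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def sample_arrival_rates(mu: Sequence[float], ports: int,
                         combine: Callable[[List[float]], float],
                         rng: random.Random, draws: int = 5) -> List[List[float]]:
    """Return lam[i][j] for port i and QoS level j.

    combine=mean mimics the Table 2 set-up and combine=max the Table 3 set-up.
    Assumption: each of the `draws` numbers is uniform on [0, mu[j]).
    """
    rates: List[List[float]] = []
    for _ in range(ports):
        row = []
        for mu_j in mu:
            while True:
                lam = combine([rng.uniform(0.0, mu_j) for _ in range(draws)])
                if lam < mu_j:  # enforce lambda_ij < mu_ij
                    break
            row.append(lam)
        rates.append(row)
    return rates


MU = [100.0, 60.0, 30.0, 15.0]  # processing rates used in the experiments
```

For example, `sample_arrival_rates(MU, 4, mean, random.Random(1))` gives a mean-load instance, and passing `max` instead gives the corresponding maximum-load instance. With the same seed, every maximum-load rate is at least as large as its mean-load counterpart.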
As shown in Tables 2 and 3, the new solution is not always superior to the initial solution in all respects. Specifically, the CTD is often worse in the final solution than initially. However, the overall goodness of the solution has improved: some aspects of performance have been sacrificed in order to improve aspects deemed more important. In these experiments, CTD was given a comparatively lower priority than CLR, resulting in decreased performance in the CTD measure.

Some of the percentage improvements listed are extremely large in magnitude. These values can be misleading, since the initial quantity may be small. Therefore, even though the percentage is large, the absolute change may be of only marginal significance.

A number of problems were also solved by exhaustive search in order to objectively determine optimal solutions for comparison with the SAHC solutions. In every case, the SAHC algorithm found an optimal solution. The problem sizes were necessarily very small, on the order of $10^6$ to $10^7$. It should be noted, however, that exhaustive search on even these small problems took hours of computation on a Silicon Graphics Indigo 2 workstation, while the SAHC method arrived at the same solutions in less than one second.
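For comparison, an exhaustive search simply scores every way of splitting the total memory among the queues. The sketch below enumerates those splits with the stars-and-bars construction; for $T$ cells and $n$ queues there are $\binom{T+n-1}{n-1}$ candidates, a count that grows quickly with both $T$ and $n$. As an assumption, the energy function is a caller-supplied cost to be minimized, and the function names are illustrative.

```python
from itertools import combinations
from typing import Callable, Iterator, List, Sequence


def all_assignments(total: int, queues: int) -> Iterator[List[int]]:
    """Yield every list of `queues` non-negative depths that sums to `total`."""
    slots = total + queues - 1
    for bars in combinations(range(slots), queues - 1):
        depths, prev = [], -1
        for b in bars:
            depths.append(b - prev - 1)  # cells between consecutive bars
            prev = b
        depths.append(slots - prev - 1)  # cells after the last bar
        yield depths


def exhaustive_search(total: int, queues: int,
                      energy: Callable[[Sequence[int]], float]) -> List[int]:
    """Return the lowest-energy assignment; the first one found wins ties."""
    return min(all_assignments(total, queues), key=energy)
```

For 10 cells over 3 queues this scores 66 assignments. Running it alongside an SAHC implementation on the same small problem reproduces the kind of cross-check described above.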

In the above examples, it is assumed that memory could be allocated across all of the buffers in the network. This works well for initial system design.

In an existing system, however, the buffering memories may not be easily reallocated between ports. Referring again to FIG. 1, each of the buffering components 16d-16f is connected to a respective port. After the switch has been designed and built, it may not be convenient to move memory from one of the buffering elements (e.g., 16d) to another buffering element (e.g., 16e). Where this is the case, it may still be possible to optimize queue depths within the individual buffering elements even after the switch has been constructed, without a shared pool of memory for all buffers on the switch. For example, if each of the queues 43a-43d (of FIG. 4) is stored in a common memory, the amount of memory allocated to each of the queues may easily be changed dynamically. The technique for assigning queue depths may be the same as that described above, except that fewer queues are analyzed.

FIG. 10 illustrates one embodiment of a buffering unit according to the present invention, such as the buffering unit 16d of FIG. 1. In this embodiment, a fabric interface controller 102 handles reception of cells from the network switch fabric 100 (in 16d of FIG. 1, this would correspond to reception of cells from the network switch fabric 12). The fabric interface controller may provide cells to the output queue buffers 103 at the direction of a buffer controller 106. Similar to the fabric interface controller 102, a port interface controller 104 handles transmission of cells to, or reception of cells from, the port 105. Both the fabric interface controller 102 and the port interface controller 104 may be implemented as off-the-shelf devices, or may be integrated into an application specific integrated circuit (ASIC) that includes all or part of the components shown in FIG. 10.
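The store and retrieve decisions that the buffer controller 106 directs can be sketched as a small class, following the behavior set out in claims 30 to 32: a cell is sorted into the queue for its QoS level, dropped if that queue is already at its assigned depth, and cells leave from the highest-QoS queue that is not empty. The sketch assumes QoS level 0 is the highest priority and that each queue is first-in, first-out; the class and method names are hypothetical, and hardware would track occupancy in memory rather than hold Python objects.

```python
from collections import deque
from typing import Deque, Generic, List, Optional, TypeVar

T = TypeVar("T")


class QosBuffer(Generic[T]):
    """One queue per QoS level, each limited to its assigned depth (in cells)."""

    def __init__(self, depths: List[int]) -> None:
        self.depths = list(depths)
        self.queues: List[Deque[T]] = [deque() for _ in depths]
        self.dropped = [0] * len(depths)  # per-level drop counters

    def store(self, unit: T, qos: int) -> bool:
        """Sorter: queue the unit by its QoS level, or drop it if that queue is full."""
        q = self.queues[qos]
        if len(q) >= self.depths[qos]:
            self.dropped[qos] += 1
            return False
        q.append(unit)
        return True

    def retrieve(self) -> Optional[T]:
        """Return the oldest unit of the highest-QoS non-empty queue, or None."""
        for q in self.queues:  # index 0 is assumed to be the highest QoS level
            if q:
                return q.popleft()
        return None
```

`QosBuffer([1, 2])` gives the top QoS level one cell of storage and the second level two; a third cell offered to the second level is dropped and counted in `dropped`, and `retrieve()` always drains the top level first.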

The output queue buffers 103 may be a single dedicated memory device, several memory devices, registers, or a portion of a total memory space used within the switch. As described above, the latter most easily permits assignment and reallocation of memory among buffering components associated with individual ports, whereas other embodiments may not as easily accommodate this.

In one embodiment, the buffer controller 106 performs the control functions of FIGs. 6-8. This may be done by responding to requests from the fabric interface controller 102 and the port interface controller 104 and controlling the output queue buffers 103 accordingly. In other embodiments, either or both of the fabric interface controller 102 and the port interface controller 104 perform some or all of these control functions (as illustrated in FIG. 4), so that a buffer controller 106 is not necessary. In another embodiment, the buffer controller 106 performs the functions of the fabric interface controller 102 and the port interface controller 104.

The above embodiments also permit dynamic monitoring of network characteristics for the switch or port, and reassignment of queue depths on the fly.

FIG. 11 illustrates one embodiment of this process. According to this embodiment, queue depths are assigned at a step 110. This may be done initially as described above, by making assumptions or estimates about network characteristics.

At a step 112, the network characteristics are monitored. These characteristics may correspond to whatever aspects affect the energy function used in the particular embodiment. For example, in the embodiments described above, mean cell arrival rates ($\lambda$), cell drop rates, cell delay rates, average throughput, etc. may be measured. This monitoring may be done by the buffer controller, a separate monitoring module, a network controller or another mechanism.

Periodically, the queue depths may be reassigned by returning to step 110. This may be done at fixed periods of time (e.g., once a day), or whenever a change in network characteristics is sensed. By logging the network characteristics, a schedule of queue depths may be created. This may be useful where the characteristics of the network vary over time (e.g., where network characteristics in the evening differ from those in the morning).
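Steps 110 and 112 can be combined into a small monitoring loop. In this sketch, which is hypothetical throughout, measured arrival rates are logged by hour of day, depths are reassigned whenever any rate moves by more than a chosen fraction of its previous value (20% here, an arbitrary threshold), and the depths in force each hour form the schedule mentioned above. `proportional_split` is only a stand-in for step 110; a search over depth assignments could be plugged in instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


def proportional_split(rates: Sequence[float], total: int = 100) -> List[int]:
    """Hypothetical stand-in for step 110: depths proportional to arrival rate."""
    s = sum(rates)
    if s <= 0:
        raise ValueError("need at least one positive arrival rate")
    depths = [int(total * r / s) for r in rates]
    depths[-1] += total - sum(depths)  # keep total memory constant
    return depths


@dataclass
class DepthScheduler:
    assign: Callable[[Sequence[float]], List[int]]  # step 110
    threshold: float = 0.2  # hypothetical: relative change treated as "a change"
    baseline: List[float] = field(default_factory=list)
    depths: List[int] = field(default_factory=list)
    schedule: Dict[int, List[int]] = field(default_factory=dict)  # hour -> depths

    def observe(self, hour: int, rates: Sequence[float]) -> bool:
        """Step 112: log measured rates; return True if depths were reassigned."""
        changed = not self.baseline or any(
            abs(r - b) > self.threshold * b for r, b in zip(rates, self.baseline))
        if changed:
            self.baseline = list(rates)
            self.depths = self.assign(rates)
        self.schedule[hour] = list(self.depths)
        return changed
```

Starting from rates (30, 10), the split is (75, 25) cells; a later reading of (31, 10) stays within the threshold and keeps that split, while a reversal to (10, 30) triggers reassignment to (25, 75).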

The process of assigning queue depths (step 110) may be performed by buffer controllers, as described above with reference to FIG. 10. Even where all of the buffers are held in a common memory and queue depths may be reassigned by sharing memory across more than one port, one or more buffer controllers may be responsible for assigning queue depths. In alternative embodiments, a separate processor may be provided for performing or coordinating the solution of the queue depth assignment problem, or this process may be performed by a network controller or other facility.

The various methods above may be implemented as software on a floppy disk, compact disk, or other storage device, which controls a computer. The computer may be a general purpose computer, such as a workstation, mainframe or personal computer, that performs the steps of the disclosed processes or implements equivalents to the disclosed block diagrams. Such a computer typically includes a central processing unit coupled to a random access memory and a program memory by a data bus of some form. The data bus may also be coupled to the output queue. The buffer controller 106 may, for example, perform these functions and be implemented in this manner. Alternatively, the various methods may be implemented in hardware, such as an ASIC or other hardware implementation. Of course, in either hardware or software embodiments, the functions performed by the above elements and the various steps may be combined in varying arrangements of hardware and software.

Having thus described at least one illustrative embodiment of the invention, various modifications and improvements will readily occur to those skilled in the art and are intended to be within the scope of the invention. Accordingly, the foregoing description is by way of example only and is not intended as limiting. The invention is limited only as defined in the following claims and the equivalents thereto.
What is claimed is: 



CLAIMS



1. A buffer element for a communication network, the buffer element comprising:

a first buffer memory to store communication units corresponding to a first quality of service level;

a second buffer memory to store communication units corresponding to a second quality of service level; and

a buffer manager, coupled to the first buffer memory and the second buffer memory, to selectively store communication units in the first buffer and the second buffer based on a corresponding quality of service level of the communication units, and to retrieve communication units from the first buffer memory and the second buffer memory.

2. The buffer element of claim 1, wherein the buffer manager comprises:

a sorter unit coupled to the first buffer memory and the second buffer memory to selectively store a communication unit in the first buffer or the second buffer based on a quality of service level of the communication unit.



3. The buffer element of claim 1, wherein the first buffer memory has a first depth, the second buffer memory has a second depth, and the buffer element further comprises:

a depth adjuster to adjust the first depth and the second depth.

4. The buffer element of claim 3, wherein the depth adjuster comprises: 

means for iteratively searching possible depth assignments to determine the first depth 
and the second depth. 


5. The buffer element of claim 4, wherein the means for searching comprises: 
means for performing a steepest ascent hill climbing search. 



6. The buffer element of claim 3, wherein the depth adjuster comprises:

means for determining performance characteristics of the switch.
7. The buffer element of claim 1, wherein the first buffer memory and the second buffer memory are regions of memory in a contiguous random access memory device.

8. The buffer element of claim 1, wherein the communication units are ATM cells. 


9. A switch for a communication network, the switch comprising:

a plurality of ports;

a first buffer memory coupled to one of the ports to store communication units corresponding to a first quality of service level; and

a second buffer memory coupled to the one of the ports to store communication units corresponding to a second quality of service level.

10. The switch of claim 9, further comprising:

a buffer manager, coupled to the first buffer memory and the second buffer memory, to selectively store communication units in the first buffer and the second buffer based on a corresponding quality of service level of the communication units, and to retrieve communication units from the first buffer memory and the second buffer memory.

11. The switch of claim 9, wherein:

the plurality of ports comprises a plurality of output ports that output communication units from the switch to the network; and

the first buffer memory and the second buffer memory are coupled to one of the plurality of output ports, to store communication units to be output to the one of the plurality of output ports.

12. The switch of claim 11, wherein:

each of the plurality of output ports has a respective first buffer memory and a respective second buffer memory to store communication units transmitted across the respective output port.
13. The switch of claim 12, wherein:

each of the plurality of output ports has a respective buffer manager to selectively store communication units in the respective first buffer and the respective second buffer based on a corresponding quality of service level of the communication units, and to retrieve communication units from the respective first buffer memory and the respective second buffer memory.

14. The switch of claim 9, wherein:

the plurality of ports comprises a plurality of input ports that receive communication units from the network into the switch; and

the first buffer memory and the second buffer memory are coupled to one of the plurality of input ports, to store communication units received on the one of the plurality of input ports.

15. The switch of claim 14, wherein:

each of the plurality of input ports has a respective first buffer memory and a respective second buffer memory to store communication units transmitted across the respective input port.

16. The switch of claim 15, wherein:

each of the plurality of input ports has a respective buffer manager to selectively store communication units in the respective first buffer and the respective second buffer based on a corresponding quality of service level of the communication unit, and to retrieve communication units from the respective first buffer memory and the respective second buffer memory.

17. The switch of claim 15, wherein the communication units are ATM cells.

18. A method of buffering communication units in a communication network, the method comprising steps of:

assigning a queue depth for each of a plurality of queues, each queue being designated to store communication units of a predetermined quality of service level;

providing the plurality of queues, each queue having the corresponding assigned depth;

selecting one of the queues to receive a communication unit based on a quality of service level associated with the communication unit; and

storing the communication unit in the selected queue.

19. The method of claim 18, further comprising a step of adjusting the queue depths.

20. The method of claim 18, further comprising steps of:

monitoring a characteristic in the communication network; and

adjusting the assigned queue depths based on the monitored characteristic.

21. The method of claim 20, wherein the characteristic is selected from the group consisting of communication unit arrival rate for one of the quality of service levels, communication unit processing rate for one of the quality of service levels, communication unit loss rate for one of the quality of service levels and communication unit delay rate for one of the quality of service levels.



22. The method of claim 18, wherein each of the plurality of queues stores communication units for a single port in a communication network switch.

23. The method of claim 22, wherein the single port is an output port. 

24. The method of claim 18, wherein the plurality of queues stores the communication units for each port of a switch in the communication network.

25. The method of claim 18, wherein the assigning step comprises a step of:

determining a priority level for dropped communication units for each of the quality of service levels.



26. The method of claim 18, wherein the assigning step comprises a step of:

assigning a priority level for communication unit delay for each of the quality of service levels.

27. The method of claim 18, wherein the assigning step comprises a step of:
performing a search of possible depth assignments.

28. The method of claim 27, wherein the performing step comprises a step of: 
performing a steepest ascent hill climbing search. 

29. The method of claim 18, wherein the communication units are ATM cells.



30. A method of selecting a communication unit for transmission in a communication network that provides a plurality of quality of service levels, the communication unit being selected from a plurality of communication units stored in a buffer, the buffer including a plurality of queues, each queue corresponding to one of the quality of service levels, the method comprising steps of:

identifying the queue with the highest corresponding quality of service level and which is not empty; and

selecting the communication unit from the identified queue.

31. A method of storing a communication unit in a buffer, the communication unit having one of a plurality of quality of service levels, the buffer including a plurality of queues, each queue corresponding to one of the quality of service levels, the method comprising steps of:

determining the quality of service level of the communication unit; and

storing the communication unit in the queue having the corresponding quality of service level of the communication unit.



32. The method of claim 31, further comprising a step of:

dropping the communication unit when the queue having the quality of service level of the communication unit is full.